We address the problem of localisation of objects as bounding boxes in images and videos with weak labels. This weakly supervised object localisation problem has been tackled in the past using discriminative models where each object class is localised independently from other classes. In this paper, a novel framework based on Bayesian joint topic modelling is proposed, which differs significantly from the existing ones in that: (1) All foreground object classes are modelled jointly in a single generative model that encodes multiple object co-existence so that "explaining away" inference can resolve ambiguity and lead to better learning and localisation. (2) Image backgrounds are shared across classes to better learn varying surroundings and "push out" objects of interest. (3) Our model can be learned with a mixture of weakly labelled and unlabelled data, allowing the large volume of unlabelled images on the Internet to be exploited for learning. Moreover, the Bayesian formulation enables the exploitation of various types of prior knowledge to compensate for the limited supervision offered by weakly labelled data, as well as Bayesian domain adaptation for transfer learning. Extensive experiments on the PASCAL VOC, ImageNet and YouTube-Object videos datasets demonstrate the effectiveness of our Bayesian joint model for weakly supervised object localisation.